Abstract

We present a system for automatically
identifying PropBank-style semantic roles
based on the output of a statistical parser
for Combinatory Categorial Grammar.

This system performs at least as well as
a system based on a traditional Treebank
parser, and outperforms it on core argument
roles.

1  Introduction 

Correctly identifying the semantic roles of sentence
constituents is a crucial part of interpreting text, and
in addition to forming an important part of the information
extraction problem, can serve as an intermediate
step in machine translation or automatic summarization.
Even for a single predicate, semantic
arguments can have multiple syntactic realizations,
as shown by the following paraphrases:

(1)  John  will  meet  with  Mary. 

John  will  meet  Mary. 

John  and  Mary  will  meet. 

(2)  The  door  opened. 

Mary  opened  the  door. 

Recently, attention has turned to creating corpora
annotated with argument structures. The
PropBank (Kingsbury and Palmer, 2002) and the
FrameNet (Baker et al., 1998) projects both document
the variation in syntactic realization of the
arguments of predicates in general English text.


Gildea and Palmer (2002) developed a system to
predict semantic roles (as defined in PropBank) from
sentences and their parse trees as determined by the
statistical parser of Collins (1999). In this paper, we
examine how the syntactic representations used by
different statistical parsers affect the performance
of such a system. We compare a parser based on
Combinatory Categorial Grammar (CCG) (Hockenmaier
and Steedman, 2002b) with the Collins parser.
As the CCG parser is trained and tested on a corpus
of CCG derivations that have been obtained by
automatic conversion from the Penn Treebank, we
are able to compare performance using both gold-standard
and automatic parses for both CCG and the
traditional Treebank representation. The Treebank
parser returns skeletal phrase-structure trees without
the traces or functional tags in the original Penn
Treebank, whereas the CCG parser returns word-word
dependencies that correspond to the underlying
predicate-argument structure, including long-range
dependencies arising through control, raising,
extraction and coordination.

2  Predicate-argument  relations  in 
PropBank 

The Proposition Bank (Kingsbury and Palmer,
2002) provides a human-annotated corpus of
semantic verb-argument relations. For each verb
appearing in the corpus, a set of semantic roles is
defined. Roles for each verb are simply numbered
Arg0, Arg1, Arg2, etc. As an example, the entry-specific
roles for the verb offer are given below:

Arg0  entity offering
Arg1  commodity
Arg2  price
Arg3  benefactive or entity offered to

These roles are then annotated for every instance
of the verb appearing in the corpus, including the
following examples:

(3) [Arg0 the company] to offer [Arg1 a 15% stake]
to [Arg2 the public].

(4) [Arg0 Sotheby’s] ... offered [Arg2 the Dorrance
heirs] [Arg1 a money-back guarantee]

(5) [Arg1 an amendment] offered by [Arg0 Rep.
Peter DeFazio]

(6) [Arg2 Subcontractors] will be offered [Arg1 a
settlement]
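
As a rough illustration of how such annotations might be held in memory, the following sketch encodes the roles of offer and the four examples above. The dictionary layout and the helper `spans_with_role` are hypothetical choices made for this illustration and do not reflect PropBank's own file format.

```python
# Roles of the verb "offer", copied from the entry above.
OFFER_ROLES = {
    "Arg0": "entity offering",
    "Arg1": "commodity",
    "Arg2": "price",
    "Arg3": "benefactive or entity offered to",
}

# One dict per annotated instance, mapping a labeled span to its role
# (examples (3) to (6)); this layout is an assumption for illustration.
INSTANCES = [
    {"the company": "Arg0", "a 15% stake": "Arg1", "the public": "Arg2"},
    {"Sotheby's": "Arg0", "the Dorrance heirs": "Arg2",
     "a money-back guarantee": "Arg1"},
    {"an amendment": "Arg1", "Rep. Peter DeFazio": "Arg0"},
    {"Subcontractors": "Arg2", "a settlement": "Arg1"},
]


def spans_with_role(role):
    """Collect every span carrying `role`, in corpus order."""
    return [span for inst in INSTANCES
            for span, r in inst.items() if r == role]
```

Calling `spans_with_role("Arg1")` gathers the commodity from an active object, a passive subject and a reduced relative alike, which is the variation in syntactic realization that the role labeler has to handle.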

A variety of additional roles are assumed
to apply across all verbs. These secondary
roles can be thought of as being adjuncts,
rather than arguments, although no claims are
made as to optionality or other traditional
argument/adjunct tests. The secondary roles include:


Location    in Tokyo, outside
Time        last week, on Tuesday, never
Manner      easily, dramatically
Direction   south, into the wind
Cause       due to pressure from Washington
Discourse   however, also, on the other hand
Extent      15%, 289 points
Purpose     to satisfy requirements
Negation    not, n’t
Modal       can, might, should, will
Adverbial   (none of the above)

and are represented in PropBank as “ArgM” with an
additional function tag, for example ArgM-TMP for
temporal. We refer to PropBank’s numbered arguments
as “core” arguments. Core arguments represent
75% of the total labeled roles in the PropBank
data. Our system predicts all the roles, including
core arguments as well as the ArgM labels and their
function tags.

3  Predicate-argument  relations  in  CCG 

Combinatory Categorial Grammar (CCG) (Steedman,
2000) is a grammatical theory which provides
a completely transparent interface between surface
syntax and underlying semantics, such that each
syntactic derivation corresponds directly to an
interpretable semantic representation which includes
long-range dependencies that arise through control,
raising, coordination and extraction.

In CCG, words are assigned atomic categories
such as NP, or functor categories like
(S[dcl]\NP)/NP (transitive declarative verb) or S/S
(sentential modifier). Adjuncts are represented
as functor categories such as S/S which expect
and return the same type. We use indices to
number the arguments of functor categories, eg.
(S[dcl]\NP_1)/NP_2, or S/S_1, and indicate the word-word
dependencies in the predicate-argument structure
as tuples $\langle w_h, c_h, i, w_a \rangle$, where $c_h$ is the lexical
category of the head word $w_h$, and $w_a$ is the head
word of the constituent that fills the $i$th argument of
$c_h$.

Long-range dependencies can be projected
through certain types of lexical categories or
through rules such as coordination of functor
categories. For example, in the lexical category of a
relative pronoun, (NP\NP_i)/(S[dcl]/NP_i), the head
of the NP that is missing from the relative clause
is unified with (as indicated by the indices i) the
head of the NP that is modified by the entire relative
clause.

Figure 1 shows the derivations of an ordinary
sentence, a relative clause and a right-node-raising
construction. In all three sentences, the predicate-argument
relations between London and denied and
plans and denied are the same, which in CCG is
expressed by the fact that London fills the first (ie.
subject) argument slot of the lexical category of denied,
(S[dcl]\NP_1)/NP_2, and plans fills the second
(object) slot. The relations extracted from the CCG
derivation for the sentence “London denied plans on
Monday” are shown in Table 1.

The CCG parser returns the local and long-range
word-word dependencies that express the predicate-argument
structure corresponding to the derivation.
These relations are recovered with an accuracy of
around 83% (labeled recovery) or 91% (unlabeled
recovery) (Hockenmaier, 2003). By contrast, standard
Treebank parsers such as (Collins, 1999) only
return phrase-structure trees, from which non-local
dependencies are difficult to recover.

Figure 1: CCG derivation trees for three clauses containing the same predicate-argument relations.


w_h      c_h                         i   w_a
denied   (S[dcl]\NP_1)/NP_2          1   London
denied   (S[dcl]\NP_1)/NP_2          2   plans
on       ((S\NP_1)\(S\NP)_2)/NP_3    2   denied
on       ((S\NP_1)\(S\NP)_2)/NP_3    3   Monday

Table 1: CCG predicate-argument relations for the
sentence “London denied plans on Monday”
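
The four relations of Table 1 can be written down directly as head/category/slot/argument tuples. The sketch below is only an illustration: the class name `Dependency` and the helper `arguments_of` are hypothetical, and words are identified by their surface form rather than by sentence position, which a real system would need.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    head: str      # w_h, the word carrying the functor category
    category: str  # c_h, lexical category of the head word
    slot: int      # i, index of the argument slot being filled
    argument: str  # w_a, head word of the filler


# Raw strings keep the CCG backslashes literal.
DEPS = [
    Dependency("denied", r"(S[dcl]\NP_1)/NP_2", 1, "London"),
    Dependency("denied", r"(S[dcl]\NP_1)/NP_2", 2, "plans"),
    Dependency("on", r"((S\NP_1)\(S\NP)_2)/NP_3", 2, "denied"),
    Dependency("on", r"((S\NP_1)\(S\NP)_2)/NP_3", 3, "Monday"),
]


def arguments_of(head, deps):
    """Map each filled slot of `head` to the word that fills it."""
    return {d.slot: d.argument for d in deps if d.head == head}
```

Here `arguments_of("denied", DEPS)` yields London for slot 1 and plans for slot 2, the same two relations that all three derivations in Figure 1 share.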

The CCG parser has been trained and tested on
CCGbank (Hockenmaier and Steedman, 2002a), a
treebank of CCG derivations obtained from the Penn
Treebank, from which we also obtain our training
data.

4  Mapping  between  PropBank  and 
CCGbank 

Our aim is to use CCG derivations as input to a system
for automatically producing the argument labels
of PropBank. In order to do this, we wish to correlate
the CCG relations above with PropBank arguments.
PropBank argument labels are assigned
to nodes in the syntactic trees from the Penn Treebank.
While the CCGbank is derived from the Penn
Treebank, in many cases the constituent structures
do not correspond. That is, there may be no constituent
in the CCG derivation corresponding to the
same sequence of words as a particular constituent
in the Treebank tree. For this reason, we compute
the correspondence between the CCG derivation and
the PropBank labels at the level of head words. For
each role label for a verb’s argument in PropBank,
we first find the head word for its constituent according
to the head rules of (Collins, 1999). We then
look for the label of the CCG relation between this
head word and the verb itself.

5  The  Experiments 

In previous work using the PropBank corpus,
Gildea and Palmer (2002) developed a system to
predict semantic roles from sentences and their
parse trees as determined by the statistical parser of
Collins (1999). We will briefly review their probability
model before adapting the system to incorporate
features from the CCG derivations.

5.1  The  model  of  Gildea  and  Palmer  (2002) 

For the Treebank-based system, we use the probability
model of Gildea and Palmer (2002). Probabilities
of a parse constituent belonging to a given
semantic role are calculated from the following features:

The phrase type feature indicates the syntactic
type of the phrase expressing the semantic role:
examples include noun phrase (NP), verb phrase (VP),
and clause (S).

The parse tree path feature is designed to capture
the syntactic relation of a constituent to the predicate.
It is defined as the path from the predicate
through the parse tree to the constituent in question,
represented as a string of parse tree nonterminals
linked by symbols indicating upward or downward
movement through the tree, as shown in Figure 2.
Although the path is composed as a string of symbols,
our systems will treat the string as an atomic
value. The path includes, as the first element of the
string, the part of speech of the predicate, and, as the
last element, the phrase type or syntactic category of
the sentence constituent marked as an argument.


Figure 2: In this example, the path from the predicate
ate to the argument NP He can be represented as
VB↑VP↑S↓NP, with ↑ indicating upward movement
in the parse tree and ↓ downward movement.
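
A minimal sketch of how such a path string could be computed is given below. It assumes trees are encoded as nested tuples of the form (label, child, ...) with words as plain strings, and that the example sentence is "He ate some pancakes"; both the encoding and the function names `find` and `tree_path` are choices made for this illustration, not the representation used by either parser.

```python
def find(tree, target, trail=()):
    """Return the nodes from the root down to `target` (matched by identity)."""
    trail = trail + (tree,)
    if tree is target:
        return list(trail)
    for child in tree[1:]:
        if isinstance(child, tuple):  # skip the word strings at the leaves
            found = find(child, target, trail)
            if found:
                return found
    return None


def tree_path(root, predicate, constituent, up="↑", down="↓"):
    """Path from the predicate's POS node to the constituent, as one string."""
    to_pred = find(root, predicate)
    to_const = find(root, constituent)
    # Count the shared prefix; its last node is the lowest common ancestor.
    shared = 0
    while (shared < min(len(to_pred), len(to_const))
           and to_pred[shared] is to_const[shared]):
        shared += 1
    climb = [node[0] for node in reversed(to_pred[shared - 1:])]
    descend = [node[0] for node in to_const[shared:]]
    return up.join(climb) + "".join(down + label for label in descend)


he = ("NP", ("PRP", "He"))
ate = ("VB", "ate")
pancakes = ("NP", ("DT", "some"), ("NN", "pancakes"))
sentence = ("S", he, ("VP", ate, pancakes))

print(tree_path(sentence, ate, he))        # VB↑VP↑S↓NP
print(tree_path(sentence, ate, pancakes))  # VB↑VP↓NP
```

Because the resulting string is treated as an atomic value, two paths that differ in a single nonterminal count as unrelated feature values, which is one source of the sparsity discussed later.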

The position feature simply indicates whether the
constituent to be labeled occurs before or after the
predicate. This feature is highly correlated with
grammatical function, since subjects will generally
appear before a verb, and objects after. This feature
may overcome the shortcomings of reading grammatical
function from the parse tree, as well as errors
in the parser output.


The voice feature distinguishes between active
and passive verbs, and is important in predicting semantic
roles because direct objects of active verbs
correspond to subjects of passive verbs. An instance
of a verb is considered passive if it is tagged as
a past participle (e.g. taken), unless it occurs as a
descendant verb phrase headed by any form of have
(e.g. has taken) without an intervening verb phrase
headed by any form of be (e.g. has been taken).
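
The voice heuristic can be sketched as a scan over the verbs that govern the participle. The version below is a simplification under stated assumptions: the caller supplies the head words of the enclosing verb phrases, innermost first, the participle tag is taken to be VBN as in the Penn Treebank tag set, and the word lists are illustrative rather than exhaustive (contracted forms such as ’s and ’ve are not handled).

```python
HAVE_FORMS = {"have", "has", "had", "having"}
BE_FORMS = {"be", "am", "is", "are", "was", "were", "been", "being"}


def is_passive(pos_tag, governing_verbs):
    """Decide voice for one verb instance.

    pos_tag: part-of-speech tag of the verb itself.
    governing_verbs: head words of the enclosing VPs, innermost first
                     (an assumed input format).
    """
    if pos_tag != "VBN":
        return False              # only past participles can be passive
    for verb in governing_verbs:
        word = verb.lower()
        if word in BE_FORMS:
            return True           # a form of "be" intervenes: passive
        if word in HAVE_FORMS:
            return False          # "have" reached first: perfect, active
    return True                   # bare participle, e.g. a reduced relative
```

For has taken the scan meets has first and returns active, while for has been taken it meets been before has and returns passive.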

The  head  word  is  a  lexical  feature,  and  provides 
information  about  the  semantic  type  of  the  role  filler. 
Head  words  of  nodes  in  the  parse  tree  are  determined 
using  the  same  deterministic  set  of  head  word  rules 
used  by  Collins  (1999). 

The system attempts to predict argument roles
in new data, looking for the highest probability
assignment of roles $r_i$ to all constituents $i$
in the sentence, given the set of features $F_i =
\{pt_i, path_i, pos_i, v_i, h_i\}$ at each constituent in the
parse tree, and the predicate $p$:

$$\operatorname{argmax}_{r_{1..n}} P(r_{1..n} \mid F_{1..n}, p)$$

We break the probability estimation into two
parts, the first being the probability $P(r_i \mid F_i, p)$ of
a constituent’s role given our five features for the
constituent, and the predicate $p$. Due to the sparsity
of the data, it is not possible to estimate this probability
from the counts in the training data. Instead,
probabilities are estimated from various subsets of
the features, and interpolated as a linear combination
of the resulting distributions. The interpolation
is performed over the most specific distributions for
which data are available, which can be thought of as
choosing the topmost distributions available from a
backoff lattice, shown in Figure 3.


Figure 3: Backoff lattice with more specific distributions
towards the top.
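
The idea of interpolating over the topmost available distributions can be sketched as follows. This is not the lattice of Figure 3: the feature subsets in `SUBSETS`, the equal interpolation weights, and the class name `BackoffModel` are all assumptions made for the illustration.

```python
from collections import Counter, defaultdict

# Hypothetical conditioning subsets; each is a tuple of feature names.
SUBSETS = [("path", "pred"), ("head", "pred"), ("pt", "pred"), ("pred",)]


class BackoffModel:
    def __init__(self, subsets=SUBSETS):
        self.subsets = subsets
        # counts[subset][context values][role] = training frequency
        self.counts = {s: defaultdict(Counter) for s in subsets}

    def observe(self, features, role):
        """Record one training constituent under every feature subset."""
        for s in self.subsets:
            self.counts[s][tuple(features[f] for f in s)][role] += 1

    def prob(self, role, features):
        """Interpolate the most specific distributions that have data."""
        seen = [s for s in self.subsets
                if tuple(features[f] for f in s) in self.counts[s]]
        # Drop any subset strictly contained in another subset with data.
        top = [s for s in seen if not any(set(s) < set(t) for t in seen)]
        if not top:
            return 0.0
        estimates = []
        for s in top:
            dist = self.counts[s][tuple(features[f] for f in s)]
            estimates.append(dist[role] / sum(dist.values()))
        return sum(estimates) / len(estimates)  # equal weights (assumed)
```

With two training constituents for a predicate, one labeled Arg0 and one Arg1, a test constituent that shares its path with the first but has an unseen head word is scored from the path-based and phrase-type-based distributions only; the predicate-only distribution is skipped because a more specific one is available.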


The probabilities $P(r_i \mid F_i, p)$ are combined with
the probabilities $P(\{r_{1..n}\} \mid p)$ for a set of roles
appearing in a sentence given a predicate, using the
following formula:

$$P(r_{1..n} \mid F_{1..n}, p) \approx P(\{r_{1..n}\} \mid p) \prod_i \frac{P(r_i \mid F_i, p)}{P(r_i \mid p)}$$

This approach, described in more detail in
Gildea and Jurafsky (2002), allows interaction between
the role assignments for individual constituents
while making certain independence assumptions
necessary for efficient probability estimation.
In particular, we assume that sets of roles appear
independent of their linear order, and that the
features $F$ of a constituent are independent of other
constituents’ features given the constituent’s role.
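
To make the interaction between local and sentence-level probabilities concrete, the sketch below scores every joint assignment exhaustively. It assumes the combination takes the form given in Gildea and Jurafsky (2002), in which each local probability $P(r_i \mid F_i, p)$ is divided by the role prior $P(r_i \mid p)$ before being multiplied by the role-set probability. The function name `best_assignment`, the argument layout, and the brute-force search are illustrative assumptions; exhaustive enumeration is exponential in the number of constituents and is shown only for clarity.

```python
from itertools import product


def best_assignment(local, prior, role_set_prob, roles):
    """Return the highest-scoring joint role assignment and its score.

    local: list of dicts, local[i][r] = P(r | F_i, p) for constituent i
    prior: dict, prior[r] = P(r | p)
    role_set_prob: function from a sorted tuple of roles to P({r_1..n} | p)
    roles: candidate role labels
    """
    best, best_score = None, -1.0
    for assignment in product(roles, repeat=len(local)):
        # Sorting makes the role set independent of linear order.
        score = role_set_prob(tuple(sorted(assignment)))
        for dist, role in zip(local, assignment):
            score *= dist.get(role, 0.0) / prior[role]
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score
```

The role-set term is what lets one constituent's label influence another's: an assignment that gives Arg0 to two constituents is penalized when that role set is rare for the predicate, even if both local probabilities favor Arg0.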

5.2  The  model  for  CCG  derivations 

In the CCG version, we replace the features above
with corresponding features based on both the sentence’s
CCG derivation tree (shown in Figure 1)
and the CCG predicate-argument relations extracted
from it (shown in Table 1).

The parse tree path feature, designed to capture
grammatical relations between constituents, is replaced
with a feature defined as follows: If there is
a dependency in the predicate-argument structure of
the CCG derivation between two words w and w',
the path feature from w to w' is defined as the lexical
category of the functor, the argument slot i occupied
by the argument, plus an arrow (← or →) to indicate
whether w or w' is the categorial functor. For example,
in our sentence “London denied plans on Monday”,
the relation connecting the verb denied with
plans is (S[dcl]\NP)/NP.2.←, with the left arrow
indicating the lexical category included in the relation
is that of the verb, while the relation connecting
denied with on is ((S\NP)\(S\NP))/NP.2.→, with
the right arrow indicating the lexical category included
in the relation is that of the modifier.
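
A small sketch of this feature, assuming dependencies are available as (head, category, slot, argument) tuples with unindexed categories, is given below. The function name `ccg_path_feature` is hypothetical, and words are matched by surface form, whereas a real system would match token positions.

```python
def ccg_path_feature(w, w_prime, deps):
    """Build the CCG path feature from w to w_prime.

    deps: iterable of (head, category, slot, argument) tuples.
    Returns None when no dependency links the two words, in which case
    the caller would fall back to a tree path over the derivation.
    """
    for head, category, slot, argument in deps:
        if head == w and argument == w_prime:
            return f"{category}.{slot}.←"  # the category is that of w
        if head == w_prime and argument == w:
            return f"{category}.{slot}.→"  # the category is that of w'
    return None


# Relations for "London denied plans on Monday"; raw strings keep the
# CCG backslashes literal.
RELATIONS = [
    ("denied", r"(S[dcl]\NP)/NP", 1, "London"),
    ("denied", r"(S[dcl]\NP)/NP", 2, "plans"),
    ("on", r"((S\NP)\(S\NP))/NP", 2, "denied"),
    ("on", r"((S\NP)\(S\NP))/NP", 3, "Monday"),
]

print(ccg_path_feature("denied", "plans", RELATIONS))
print(ccg_path_feature("denied", "on", RELATIONS))
```

The pair denied and Monday has no direct dependency, so the function returns None for it, which corresponds to the fallback case.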

If the CCG derivation does not define a predicate-argument
relation between the two words, we use
the parse tree path feature described above, defined
over the CCG derivation tree. In our training data,
77% of PropBank arguments corresponded directly
to a relation in the CCG predicate-argument representation,
and the path feature was used for the remaining
23%. Most of these mismatches arise because
the CCG parser and PropBank differ in their
definition of head words. For instance, the CCG
parser always assumes that the head of a PP is
the preposition, whereas PropBank roles can be assigned
to the entire PP (7), or only to the NP argument
of the preposition (8), in which case the head
word comes from the NP:

(7) ... will be offered [PP_ArgM-LOC in the U.S.].

(8) to offer ... [PP to [NP_Arg2 the public]].

In embedded clauses, CCG assumes that the head is
the complementizer, whereas in PropBank, the head
comes from the embedded sentence itself. In complex
verb phrases (eg. “might not have gone”), the
CCG parser assumes that the first auxiliary (might)
is head, whereas PropBank assumes it is the main
verb (gone). Therefore, CCG assumes that not modifies
might, whereas PropBank assumes it modifies
gone. Although the head rules of the parser
could in principle be changed to reflect more directly
the dependencies in PropBank, we have not
attempted to do so yet. Further mismatches occur
because the predicate-argument structure returned
by the CCG parser only contains syntactic dependencies,
whereas the PropBank data also contain
some anaphoric dependencies, eg.:

(9) [Arg0 Realist’s] negotiations to acquire
Ammann Laser Technik AG...

(10) When properly applied, [Arg0 the adhesive] is
designed to...

Such dependencies also do not correspond to a relation
in the predicate-argument structure of the CCG
derivation, and cause the path feature to be used.

The phrase type feature is replaced with the lexical
category of the maximal projection of the PropBank
argument’s head word in the CCG derivation
tree. For example, the category of plans is N, and
the category of denied is (S[dcl]\NP)/NP.

The voice feature can be read off the CCG categories,
since the CCG categories of past participles
carry different features in active and passive voice
(eg. sold can be (S[pt]\NP)/NP or S[pss]\NP).

The head word of a constituent is indicated in the
derivations returned by the CCG parser.

Figure 4: A sample sentence as produced by the Treebank parser (left) and by the CCG parser (right). Nodes
are annotated with PropBank roles ARG0, ARG1 and ARGM-TMP.


                                 Treebank-based               CCG-based
Features extracted from    Args  Precision  Recall  F-score   Precision  Recall  F-score
Automatic parses           core  75.9       69.6    72.6      76.1       73.5    74.8
                           all   72.6       61.2    66.4      71.0       63.1    66.8
Gold-standard parses       core  85.5       81.7    83.5      82.4       78.6    80.4
                           all   78.8       69.9    74.1      76.3       67.8    71.8
Gold-standard w/o traces   core  77.6       75.2    76.3
                           all   74.4       66.5    70.2

Table 2: Accuracy of semantic role prediction
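
As a quick consistency check on the table, the sketch below recomputes F-scores from precision and recall, assuming the reported F-score is the balanced harmonic mean (the text does not state this explicitly). The last row reads the CCG gold-standard precision on all roles as 76.3, the value consistent with the reported recall of 67.8 and F-score of 71.8.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) for a few rows of Table 2.
ROWS = {
    "Treebank, automatic parses, core": (75.9, 69.6, 72.6),
    "CCG, automatic parses, core": (76.1, 73.5, 74.8),
    "CCG, gold-standard parses, all": (76.3, 67.8, 71.8),
}

for name, (p, r, reported) in ROWS.items():
    print(f"{name}: computed {f_score(p, r):.1f}, reported {reported}")
```

Small disagreements in the last digit can still occur for other rows, since the published precision and recall are themselves rounded.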


5.3  Data 

We use data from the November 2002 release of
PropBank. The dataset contains annotations for
72,109 predicate-argument structures with 190,815
individual arguments (of which 75% are core, or
numbered, arguments) and includes examples
from 2462 lexical predicates (types). Annotations
from Sections 2 through 21 of the Treebank were
used for training; Section 23 was the test set. Both
parsers were trained on Sections 2 through 21.

6  Results 

Because of the mismatch between the constituent
structures of CCG and the Treebank, we score both
systems according to how well they identify the head
words of PropBank’s arguments. Table 2 gives the
performance of the system on both PropBank’s core,
or numbered, arguments, and on all PropBank roles
including the adjunct-like ArgM roles. In order to
analyze the impact of errors in the syntactic parses,
we present results using features extracted from both
automatic parser output and the gold standard parses
in the Penn Treebank (without functional tags) and
in CCGbank. Using the gold standard parses provides
an upper bound on the performance of the system
based on automatic parses. Since the Collins
parser does not provide trace information, its upper
bound is given by the system tested on the
gold-standard Treebank representation with traces
removed. In Table 2, “core” indicates results on
PropBank’s numbered arguments (ARG0...ARG5)
only, and “all” includes numbered arguments as well
as the ArgM roles. Most of the numbered arguments
(in particular ARG0 and ARG1) correspond
to arguments that the CCG category of the verb directly
subcategorizes for. The CCG-based system
outperforms the system based on the Collins parser
on these core arguments, and has comparable performance
when all PropBank labels are considered. We
believe that the superior performance of the CCG
system on these core arguments is due to its ability to
recover long-distance dependencies, whereas we attribute
its lower performance on non-core arguments
mainly to the mismatches between PropBank and
CCGbank.

The importance of long-range dependencies for
our task is indicated by the fact that the performance
on the Penn Treebank gold standard without traces
is significantly lower than that on the Penn Treebank
with trace information. Long-range dependencies
are especially important for core arguments, shown
by the fact that removing trace information from the
Treebank parses results in a bigger drop for core
arguments (83.5 to 76.3 F-score) than for all roles
(74.1 to 70.2). The ability of the CCG parser to recover
these long-range dependencies accounts for its
higher performance, and in particular its higher recall,
on core arguments.

                                     Treebank-based               CCG-based
                         Scoring     Precision  Recall  F-score   Precision  Recall  F-score
Automatic parses         Head word   72.6       61.2    66.4      71.0       63.1    66.8
                         Boundary    68.6       57.8    62.7      55.7       49.5    52.4
Gold-standard parses     Head word   77.6       75.2    76.3      76.3       67.8    71.8
(Treebank: w/o traces)   Boundary    74.4       66.5    70.2      67.5       60.0    63.5

Table 3: Comparison of scoring regimes, using automatic parser output and gold standard parses. The first
row in this table corresponds to the second row in Table 2.

The CCG gold standard performance is below
that of the Penn Treebank gold standard with traces.
We believe this performance gap to be caused by
the mismatches between the CCG analyses and the
PropBank annotations described in Section 5.2. For
the reasons described, the head words of the constituents
that have PropBank roles are not necessarily
the head words that stand in a predicate-argument
relation in CCGbank. If two words do not stand in a
predicate-argument relation, the CCG system takes
recourse to the path feature. This feature is much
sparser in CCG: since CCG categories encode subcategorization
information, the number of categories
in CCGbank is much larger than that of Penn Treebank
labels. Analysis of our system’s output shows
that the system trained on the Penn Treebank gold
standard obtains 55.5% recall on those relations that
require the CCG path feature, whereas the system
using CCGbank only achieves 36.9% recall on these.
Also, in CCG, the complement-adjunct distinction
is represented in the categories for the complement
(eg. PP) or adjunct (eg. (S\NP)\(S\NP)) and in
the categories for the head (eg. (S[dcl]\NP)/PP
or S[dcl]\NP). In generating the CCGbank, various
heuristics were used to make this distinction. In particular,
for PPs, it depends on the “closely-related”
(CLR) function tag, which is known to be unreliable.
The decisions made in deriving the CCGbank
often do not match the hand-annotated complement-adjunct
distinctions in PropBank, and this inconsistency
is likely to make our CCGbank-based features
less predictive. A possible solution is to regenerate
the CCGbank using the PropBank annotations.

The impact of our head-word based scoring is analyzed
in Table 3, which compares results when only
the head word must be correctly identified (as in Table 2)
with results when both the beginning and
end of the argument must be correctly identified in
the sentence (as in Gildea and Palmer (2002)). Even
if the head word is given the correct label, the boundaries
of the entire argument may be different from
those given in the PropBank annotation. Since constituents
in CCGbank do not always match those in
PropBank, even the CCG gold standard parses obtain
comparatively low scores according to this metric.
This is exacerbated when automatic parses are
considered.
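
The difference between the two scoring regimes can be sketched as two views of the same labeled arguments. The tuple layout (role, head index, start, end) and the function names below are assumptions made for this illustration; indices are token positions within one sentence.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold set."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def head_view(args):
    """Head-word scoring: a label counts if it sits on the right head word."""
    return {(role, head) for role, head, _, _ in args}


def boundary_view(args):
    """Boundary scoring: both ends of the argument span must match."""
    return {(role, start, end) for role, _, start, end in args}


# Hypothetical sentence: the second argument has the right head (token 3)
# but the predicted span misses its first token.
gold = [("Arg0", 0, 0, 0), ("Arg1", 3, 2, 3)]
predicted = [("Arg0", 0, 0, 0), ("Arg1", 3, 3, 3)]

print(precision_recall(head_view(predicted), head_view(gold)))          # (1.0, 1.0)
print(precision_recall(boundary_view(predicted), boundary_view(gold)))  # (0.5, 0.5)
```

An argument whose head word is labeled correctly but whose constituent does not line up with the PropBank span is counted as correct under the first view and as an error under the second, which is the situation CCG derivations run into most often.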

7  Conclusion 

Our CCG-based system for automatically labeling
verb arguments with PropBank-style semantic roles
outperforms a system using a traditional Treebank-based
parser for core arguments, which comprise
75% of the role labels, but scores lower on adjunct-like
roles such as temporals and locatives. The CCG
parser returns predicate-argument structures that include
long-range dependencies; therefore, it seems
inherently better suited for this task. However, the
performance of our CCG system is lowered by the
fact that the syntactic analyses in its training corpus
differ from those that underlie PropBank in important
ways (in particular in the notion of heads and the
complement-adjunct distinction). We would expect
a higher performance for the CCG-based system if
the analyses in CCGbank resembled more closely
those in PropBank.

Our results also indicate the importance of recovering
long-range dependencies, either through the
trace information in the Penn Treebank, or directly,
as in the predicate-argument structures returned by
the CCG parser. We speculate that much of the
performance improvement we show could be obtained
with traditional (ie. non-CCG-based) parsers
if they were designed to recover more of the information
present in the Penn Treebank, in particular
the trace co-indexation. An interesting experiment
would be the application of our role-labeling system
to the output of the trace recovery system of
Johnson (2002). Our results also have implications
for parser evaluation, as the most frequently used
constituent-based precision and recall measures do
not evaluate how well long-range dependencies can
be recovered from the output of a parser. Measures
based on dependencies, such as those of Lin (1995)
and Carroll et al. (1998), are likely to be more relevant
to real-world applications of parsing.